Exploration and Exploitation in Parkinson’s Disease: Computational Analyses
Authors
Affiliations
Björn Meder
Health and Medical University, Potsdam, Germany
Martha Sterf
Medical School Berlin, Berlin, Germany
Charley M. Wu
University of Tübingen, Tübingen, Germany
Matthias Guggenmos
Health and Medical University, Potsdam, Germany
Published
September 3, 2025
Code
# Housekeeping: Load packages and helper functions
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(fig.align = 'center')
options(knitr.kable.NA = '')

packages <- c('gridExtra', 'BayesFactor', 'tidyverse', "RColorBrewer", "lme4",
              "sjPlot", "lsr", "brms", "kableExtra", "afex", "emmeans", "viridis",
              "ggpubr", "hms", "scales", "cowplot", "waffle", "ggthemes",
              "parameters", "rstatix", "magick", "grid", "cetcolor", "ggcorrplot")
installed <- packages %in% rownames(installed.packages())
if (any(!installed)) {
  install.packages(packages[!installed])
}
# Load all packages
lapply(packages, require, character.only = TRUE)

set.seed(0815)

# File with various statistical functions; among other things it provides
# tests for Bayes Factors (BFs)
source('statisticalTests.R')

# Wrapper for brm models: saves the full model the first time it is run,
# otherwise loads it from disk
run_model <- function(expr, modelName, path = 'brm', reuse = TRUE) {
  path <- paste0(path, '/', modelName, ".brm")
  fit <- NULL
  if (reuse) {
    fit <- suppressWarnings(try(readRDS(path), silent = TRUE))
  }
  if (is.null(fit) || is(fit, "try-error")) {
    fit <- eval(expr)
    saveRDS(fit, file = path)
  }
  fit
}

# Setting some plotting params
w_box <- 0.2  # width of boxplot, also used for jittering points and lines
line_jitter <- w_box / 2
xAnnotate <- -0.3
# jitter params
jit_height <- 0.01
jit_width <- 0.05
jit_alpha <- 0.6
# colors for age groups
groupcolors <- c("#d95f02", "#1b9e77", "#7570b3")
choice3_colors <- c("#e7298a", "#66a61e", "#e6ab02")
1 Preamble
This document provides R code for the statistical analyses and plots of the behavioral data reported in the article
Meder, B., Sterf, M., Wu, C. M., & Guggenmos, M. (2025). Uncertainty-directed and random exploration in Parkinson’s disease. PsyArXiv
All analyses are fully reproducible, with the R code shown alongside the results, and random seeds set to ensure identical outputs across runs. Full session info is provided at the end of the document. All materials, including this document and all data, are available at:
There are two files with behavioral data:
data_gridsearch_parkinson.csv, which contains the behavioral data from rounds 1-9 of the task
data_gridsearch_subjects.csv, which contains participant information.
These files are combined into the data frame dat, which includes the following variables:
id: participant id
age: participant age in years
gender: (m)ale, (f)emale, (d)iverse
x and y: the sampled coordinates on the grid
chosen: the x and y coordinates of the chosen tile
z: the reward obtained from the chosen tile, normalized to the range 0-1. Re-clicked tiles could show small variations in the observed color (i.e., underlying reward) due to normally distributed noise, \(\epsilon \sim \mathcal{N}(0,1)\).
z_scaled: the observed outcome (reward), scaled in each round to a randomly drawn maximum value in the range of 70% to 90% of the highest reward value
trial: the trial number (0-25), with 0 corresponding to the initially revealed random tile, i.e., trial 1 is the first choice
round: the round number (1 through 10), with 1 = practice round (not analyzed) and 10 = bonus round (analyzed only for bonus round judgments)
distance: the Manhattan distance between consecutive clicks; NA for trial 0, the initially revealed random tile
type_choice: categorizes consecutive clicks as “repeat” (clicking the same tile as on the previous trial), “near” (clicking a directly neighboring tile, i.e., distance = 1), or “far” (clicking a tile with distance > 1); NA for trial 0, the initially revealed random tile
previous_reward: the reward z obtained on the previous trial; NA for trial 0, the initially revealed random tile
last_ldopa: time of the last L-Dopa dose (HH:MM)
next_ldopa: scheduled time of the next L-Dopa dose (HH:MM)
time_exp: time of the experiment (HH:MM)
time_since_ldopa: time since the last L-Dopa dose (in minutes)
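As an illustration (hypothetical coordinates, not the authors' preprocessing code), `distance` and `type_choice` can be derived from the click coordinates of a single round like this:

```r
# Hypothetical sketch (not the authors' preprocessing code): deriving
# `distance` and `type_choice` from the click coordinates of one round.
x <- c(4, 4, 5, 2)  # sampled x coordinates, trials 0-3
y <- c(7, 7, 7, 3)  # sampled y coordinates, trials 0-3

# Manhattan distance between consecutive clicks; NA for trial 0
distance <- c(NA, abs(diff(x)) + abs(diff(y)))

# repeat: same tile; near: direct neighbor (distance 1); far: distance > 1
type_choice <- ifelse(is.na(distance), NA,
                      ifelse(distance == 0, "repeat",
                             ifelse(distance == 1, "near", "far")))
type_choice  # NA, "repeat", "near", "far"
```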
File modelFits.csv contains the results of the computational model simulations (GP-UCB model and lesioned variants).
3 Computational Analyses: Gaussian Process Upper Confidence Bound (GP-UCB) Model
The behavioral analyses showed that individuals in a dopamine-depleted state exhibit a severe deficit in balancing exploration and exploitation. By contrast, the behavior of patients on medication was markedly improved and largely resembled controls. The increased exploration in patients off medication could result from more random choice behavior, an increased emphasis on uncertainty-directed exploration, or from impaired generalization reducing the ability to use information from one option to guide choices.
To disentangle these mechanisms, we used the Gaussian Process Upper Confidence Bound (GP-UCB) model (see Methods for formal specification). The model integrates similarity-based generalization with two distinct exploration mechanisms: uncertainty-directed exploration, which seeks to reduce uncertainty about rewards, and random exploration, which adds stochastic noise without being directed towards a particular goal (Wu et al., 2018; Wu et al., 2025). These processes are captured by three key parameters: the generalization parameter \(\lambda\), which determines how strongly rewards are generalized across options; the uncertainty bonus \(\beta\), which governs the degree of uncertainty-directed exploration by determining the value given to uncertainty; and the temperature parameter \(\tau\), which captures random exploration through choice variability.
In previous research using the same experimental paradigm, this model provided the best account of exploratory behavior in healthy participants (Giron et al., 2023; Meder et al., 2021; Schulz et al., 2019; Witt et al., 2024; Wu et al., 2018; Wu et al., 2020). Importantly, by decomposing exploration into generalization (\(\lambda\)), uncertainty-driven exploration (\(\beta\)), and random exploration (\(\tau\)), the model allows us to identify which mechanisms are altered by PD and medication. This approach builds on prior findings that levodopa impairs discrimination learning while sparing generalization in PD (Shohamy et al., 2006), that levodopa reduces directed exploration in healthy participants (Chakroun et al., 2020), and that PD disrupts the overall exploration-exploitation balance (Djamshidian et al., 2011; Gilmour et al., 2024; Seymour et al., 2016).
The GP-UCB model comprises three components: a learning model, which uses Bayesian inference to generate predictions about the rewards associated with each option (tiles); a sampling strategy, which uses both the reward expectations and the uncertainty about this expectation to evaluate options; and a choice rule, which maps the valuation of options onto choice probabilities (see Methods for formal specification).
3.1 Learning model: Gaussian Process generalization.
The similarity-based learning mechanism is implemented by a Gaussian Process (GP) (Schulz et al., 2018; Williams & Rasmussen, 2006), which learns an unknown spatial value function from noisy observations (e.g., a mapping from spatial location to expected reward). The amount of generalization depends on the similarity of options, where similarity is defined as spatial proximity: options that are closer to each other are assumed to be more alike (i.e., yield similar rewards) than options that are further away. The degree to which learning about one location influences reward expectations of other locations is governed by the parameter \(\lambda\), indicating how strongly a learner extrapolates from known rewards to other locations. Higher values of \(\lambda\) imply stronger generalization, whereas lower values correspond to weaker generalization.
Formally, a GP defines a probability distribution over functions mapping inputs to outputs \(f: \mathcal{X} \rightarrow Y\). In our case, these functions map grid locations \(\mathbf{x}\in \mathcal{X}\) to scalar reward observations \(y \in Y\), with the prior distribution taking the form of a multivariate Gaussian:
\[
f \sim \mathcal{GP}\big(m(\mathbf{x}),\, k(\mathbf{x}, \mathbf{x}')\big).
\tag{1}\]
The GP is fully specified by prior mean function \(m(\mathbf{x})\) defining the prior expectations of each input, and a kernel (covariance) \(k(\mathbf{x}, \mathbf{x}')\) encoding how strongly rewards at two locations are expected to covary as a function of their distance (Equation 2). Without loss of generality, we set the prior mean to zero (Williams & Rasmussen, 2006) and use the common radial basis function (RBF) kernel:
\[
k(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\|\mathbf{x}-\mathbf{x}'\|^2}{2\lambda^2}\right).
\tag{2}\]
Here, \(\mathbf{x}\) and \(\mathbf{x}'\) are the coordinates of two tiles on the grid, and \(\lambda\) is the length-scale parameter, which governs the amount of generalization (i.e., the smoothness of the function). Higher values of \(\lambda\) imply smoother functions, leading to stronger expectations regarding reward correlations. Lower values of \(\lambda\) entail rougher functions, i.e., less correlation among similar options. As \(\lambda \to \infty\), the RBF kernel assumes functions approaching linearity; as \(\lambda \to 0\), there ceases to be any spatial correlation, meaning that options’ rewards are learned independently. In our analyses, we treat \(\lambda\) as a free parameter representing the extent to which learners generalize rewards as a function of spatial proximity.
To compute posterior predictions for any target location \(\mathbf{x}_\star\), we condition the model on a set of observations \(\mathcal{D}_t=\{X_{t}, \textbf{y}_{t}\}\) of choices \(X_{t} = [\mathbf{x}_1, \ldots \mathbf{x}_t]\) and corresponding reward observations \(\mathbf{y}_t = [y_1, \ldots, y_t]\) at time \(t\). This posterior also takes the form of a multivariate Gaussian:
\[
m(\mathbf{x}_\star|\mathcal{D}_t) = \mathbf{k}_\star^\top \big(K_{X,X} + \sigma_\epsilon^2 I\big)^{-1} \mathbf{y}_t
\tag{3}\]
\[
v(\mathbf{x}_\star|\mathcal{D}_t) = k(\mathbf{x}_\star, \mathbf{x}_\star) - \mathbf{k}_\star^\top \big(K_{X,X} + \sigma_\epsilon^2 I\big)^{-1} \mathbf{k}_\star
\tag{4}\]
Here, \(\mathbf{k}_\star=[k(\mathbf{x}_1,\mathbf{x}_\star), \ldots, k(\mathbf{x}_t,\mathbf{x}_\star)]\) is the vector of kernel similarities between past observations and the target location, \(K_{X,X}\) is the matrix of pairwise kernel similarities between all past observations in \(X_t\), and \(I\) is a \(t \times t\) identity matrix. \(\sigma_\epsilon^2\) is the observation noise, capturing the stochasticity of reward observations; it is fixed to the true reward variability of each arm of the bandit, \(\sigma_\epsilon^2=.0001\).
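A minimal sketch of these posterior computations in R (not the authors' implementation; the two observations on a 1D slice of the grid are made up for illustration, with \(\lambda=1\) and \(\sigma_\epsilon^2=.0001\)):

```r
# Minimal sketch (not the authors' implementation) of the GP posterior with
# an RBF kernel. The observations are made up; lambda = 1 and
# sigma_eps^2 = .0001 follow the text above.
gp_posterior <- function(X, y, x_star, lambda = 1, sigma_eps2 = 1e-4) {
  # pairwise kernel similarities between all observed locations
  K <- exp(-as.matrix(dist(X))^2 / (2 * lambda^2))
  # kernel similarities between each observation and the target location
  d2 <- rowSums((X - matrix(x_star, nrow(X), ncol(X), byrow = TRUE))^2)
  k_star <- exp(-d2 / (2 * lambda^2))
  A <- solve(K + sigma_eps2 * diag(nrow(X)))
  m <- drop(k_star %*% A %*% y)           # posterior mean
  v <- 1 - drop(k_star %*% A %*% k_star)  # posterior variance; k(x*, x*) = 1
  c(mean = m, var = v)
}

X <- matrix(c(1, 4), ncol = 1)  # two observed locations on a 1D slice
y <- c(0.8, 0.2)                # their observed rewards

post_near <- gp_posterior(X, y, x_star = 1)   # at the data: mean ~0.8, tiny variance
post_far  <- gp_posterior(X, y, x_star = 10)  # far away: mean reverts to prior 0, variance ~1
```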
3.2 Sampling strategy: Balancing rewards and uncertainty
Options are valued according to Upper Confidence Bound (UCB) sampling, which considers both reward expectations and the associated uncertainty (Auer, 2002). UCB sampling implements a form of uncertainty-directed exploration that balances exploiting high rewards and seeking information. Uncertainty is valued positively to promote exploration of underexplored options, with the strength of this uncertainty bonus represented by the parameter \(\beta\):
\[
\text{UCB}(\mathbf{x}) = m(\mathbf{x}|\mathcal{D}_t) + \beta \sqrt{v(\mathbf{x}|\mathcal{D}_t)}
\tag{7}\]
where the expected reward of an option \(m(\mathbf{x}|\mathcal{D}_t)\) captures its exploitation value, and the scaled uncertainty \(\beta\sqrt{v(\mathbf{x}|\mathcal{D}_t)}\) captures its exploration value, with \(\beta\) modulating how much exploration is promoted relative to exploitation. Higher values of \(\beta\) reflect a stronger drive to explore uncertain options, while lower values reflect a preference for exploiting known high-reward options.
3.3 Choice rule: Translating value into action
After computing UCB values for each option, the model does not always pick the most valuable one. Instead, it samples probabilistically using a softmax choice function, which adds random decision noise to the choice process:
\[
p(\mathbf{x}) = \frac{\exp\big(\text{UCB}(\mathbf{x})/\tau\big)}{\sum_{\mathbf{x}'}\exp\big(\text{UCB}(\mathbf{x}')/\tau\big)}
\tag{8}\]
The amount of randomness in the choice probabilities is governed by the temperature parameter \(\tau\). Higher values of \(\tau\) make the choice probabilities more uniform, such that choice behavior is less influenced by options’ UCB values and more random. Lower values of \(\tau\) imply that the learner is more sensitive to options’ UCB values, making high-value options increasingly likely to be selected. In the limits, if \(\tau \rightarrow 0\), choice behavior reduces to a greedy policy that always selects the option with the highest value (pure exploitation), and if \(\tau \rightarrow \infty\) all options are chosen with equal probability (pure exploration). Here, we treat the temperature parameter \(\tau\) as a computational marker of a learner’s tendency to explore randomly, i.e., in an undirected fashion through inherent decision noise. Higher values of \(\tau\) correspond to more random exploration, and lower values to more deterministic choice behavior.
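The UCB valuation and softmax choice rule can be sketched as follows (illustrative R code, not the authors' implementation; the posterior means and variances for three options are made up):

```r
# Illustrative sketch (not the authors' code): UCB valuation followed by a
# softmax choice. Posterior means and variances for three options are made up.
m <- c(0.6, 0.5, 0.2)     # posterior means
v <- c(0.01, 0.40, 0.90)  # posterior variances

ucb <- function(m, v, beta) m + beta * sqrt(v)

softmax <- function(q, tau) {
  p <- exp((q - max(q)) / tau)  # subtract max(q) for numerical stability
  p / sum(p)
}

q <- ucb(m, v, beta = 0.5)  # the uncertainty bonus makes option 2 most valuable

p_lo <- softmax(q, tau = 0.05)  # low tau: nearly deterministic
p_hi <- softmax(q, tau = 5)     # high tau: nearly uniform
```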
3.3.1 GP-UCB model parameters
Associated with each model component is a free parameter that we estimate through out-of-sample cross validation. These parameters provide a window into distinct aspects of learning and exploration:
The length-scale parameter \(\lambda\) of the RBF kernel (Equation 2) captures how strongly a participant generalizes based on the observed evidence, i.e., the rewards obtained from previous choices.
The uncertainty bonus \(\beta\) in the UCB valuation of options (Equation 7) represents the level of directed exploration, i.e., how much expected rewards are inflated through an “uncertainty bonus”.
The temperature parameter \(\tau\) of the softmax choice rule (Equation 8) corresponds to the amount of sampling noise, i.e., the extent of random exploration.
4 Model comparison
We tested the ability of the GP-UCB model to predict each participant’s search and decision-making behavior. To assess the contribution of each component of the model (generalization, uncertainty-directed exploration, and random exploration), we compared the predictive accuracy of the GP-UCB model to model variants in which each component is lesioned away.
4.1 Lesioned models
To establish that all components of the GP-UCB model are required to explain behavior, we implemented three lesion variants of the model (Giron et al., 2023).
4.1.1 \(\lambda\) lesion model
The \(\lambda\) lesion model removes the ability to generalize, such that options’ rewards are learned independently via a Bayesian Mean Tracker (BMT). The BMT is a Kalman filter with time-invariant rewards (Dayan et al., 2000; Wu et al., 2022), and as such can be interpreted as a Bayesian variant (Gershman, 2015) of the classic Rescorla-Wagner (Rescorla & Wagner, 1972) or Q-learning models (Watkins & Dayan, 1992). Intuitively, reward estimates are updated as a function of prediction error, where the learning rate is dynamically defined based on the degree of uncertainty of the model.
Like the GP, the BMT also assumes a Gaussian prior distribution of reward expectations, but does so independently for each option \(\mathbf{x}\):
\[
p \big(r_0(\mathbf{x})\big) \sim \mathcal{N}\big(m_0(\mathbf{x}),v_0(\mathbf{x})\big)
\tag{9}\]
where \(m_0(\mathbf{x})=0\) as in the GP, and we set \(v_0(\mathbf{x})=5\) following Giron et al. (2023).
The BMT then computes a posterior distribution of the expected reward for each option, also in the form of a Gaussian, but where the posterior mean \(m_t(\mathbf{x})\) and posterior variance \(v_t(\mathbf{x})\) are defined independently for each option and computed by the following updates:
\[
m_t(\mathbf{x}) = m_{t-1}(\mathbf{x}) + \delta_t(\mathbf{x})\, G_t(\mathbf{x}) \big[y_t - m_{t-1}(\mathbf{x})\big]
\tag{10}\]
\[
v_t(\mathbf{x}) = \big[1 - \delta_t(\mathbf{x})\, G_t(\mathbf{x})\big]\, v_{t-1}(\mathbf{x})
\tag{11}\]
Both updates use \(\delta_t(\mathbf{x})=1\) if option \(\mathbf{x}\) was chosen on trial \(t\), and \(\delta_t(\mathbf{x})=0\) otherwise. Thus, the posterior mean and variance are only updated for the chosen option. The update of the mean is based on the prediction error \(y_t-m_{t-1}(\mathbf{x})\) between observed and anticipated reward, while the magnitude of the update is based on the Kalman gain \(G_t(\mathbf{x})\):
\[
G_t(\mathbf{x}) = \frac{v_{t-1}(\mathbf{x})}{v_{t-1}(\mathbf{x}) + \theta_\epsilon^2}
\tag{12}\]
analogous to the learning rate of the Rescorla-Wagner or Q-learning models. Here, the Kalman gain is dynamically defined as a ratio of variance terms, where \(v_{t-1}(\mathbf{x})\) is the variance estimate before the update and \(\theta_\epsilon^2\) is the error variance, which we treat as a free parameter; it can be interpreted as an inverse sensitivity parameter, with smaller values of \(\theta_\epsilon^2\) resulting in larger updates of the mean.
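A minimal sketch of the BMT update (not the authors' code), using the lesion model's prior \(m_0=0\), \(v_0=5\) and an arbitrary error variance \(\theta_\epsilon^2=1\) (in the analyses it is a free parameter):

```r
# Minimal sketch (not the authors' code) of one BMT update for the chosen
# option. Prior m0 = 0, v0 = 5 as in the lesion model; theta_eps^2 = 1 is an
# arbitrary illustrative value (it is a free parameter in the analyses).
bmt_update <- function(m, v, y, theta_eps2 = 1) {
  m <- unname(m); v <- unname(v)
  G <- v / (v + theta_eps2)  # Kalman gain: large when uncertainty is high
  c(mean = m + G * (y - m),  # mean moves toward the observation
    var  = (1 - G) * v,      # uncertainty shrinks after observing
    gain = G)
}

step1 <- bmt_update(m = 0, v = 5, y = 0.9)                  # first observation: big update
step2 <- bmt_update(step1["mean"], step1["var"], y = 0.9)   # second: smaller gain, smaller update
```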
4.1.2 \(\beta\) lesion model
The \(\beta\) lesion model evaluates options solely based on their expected rewards, corresponding to a mean-greedy (MG) sampling strategy, and is implemented by setting the uncertainty bonus to \(\beta = 0\) (see Equation 7). Effectively, this equates the value of options with their posterior mean \(\text{MG}(\mathbf{x}) = m(\mathbf{x}|\mathcal{D}_t)\).
4.1.3 \(\tau\) lesion model
The \(\tau\) lesion model replaces the softmax choice function (see Equation 8) with an \(\epsilon\)-greedy policy as an alternative mechanism for random exploration. Under this policy, with probability \(\epsilon\) a random option is selected and with probability \(1-\epsilon\), the option with the highest UCB value is chosen:
\[
\mathbf{x}_t =
\begin{cases}
\text{arg max}_{\mathbf{x}}\, \text{UCB}(\mathbf{x}), & \text{with probability } 1-\epsilon \\
\text{a uniformly sampled tile } (p = 1/64), & \text{with probability } \epsilon
\end{cases}
\tag{13}\]
with the parameter \(\epsilon\) estimated individually for each participant.
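The resulting choice probabilities can be sketched as follows (illustrative R code, not the authors' implementation; the UCB values are random placeholders):

```r
# Illustrative sketch (not the authors' code): epsilon-greedy choice
# probabilities over the 64 tiles; the UCB values are random placeholders.
eps_greedy_probs <- function(ucb_values, epsilon) {
  n <- length(ucb_values)
  p <- rep(epsilon / n, n)            # uniform random choice with probability epsilon
  best <- which.max(ucb_values)
  p[best] <- p[best] + (1 - epsilon)  # greedy choice with probability 1 - epsilon
  p
}

set.seed(0815)
q <- runif(64)  # placeholder UCB values for an 8 x 8 grid
p <- eps_greedy_probs(q, epsilon = 0.1)
```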
4.2 Model cross validation
Models’ predictive accuracy was assessed using leave-one-round-out cross-validation based on maximum likelihood estimation, with parameter bounds set to the range \([\exp(-5), \exp(4)]\). Specifically, we iteratively held out one round from the task, fitted each model to the remaining seven rounds, and then tested its ability to predict participants’ choices on the 25 trials of the holdout round. Predictive accuracy was quantified as the sum of negative log-likelihoods across all out-of-sample predictions. Individual parameter estimates for participants are based on averaging over the cross-validated maximum likelihood estimates.
The negative log-likelihoods served as the model evidence for the hierarchical Bayesian model selection based on protected exceedance probabilities (Rigoux et al., 2014), and for quantifying predictive accuracy using a pseudo-\(R^2\) measure, where the summed log loss of each model is compared to a random baseline model. Accordingly, \(R^2=0\) corresponds to chance performance and \(R^2=1\) corresponds to theoretically perfect predictions:
\[
R^2 = 1 - \frac{\log \mathcal{L}(M_k)}{\log\mathcal{L}(M_{rand})}
\]
Participant classification was based on which model had the highest \(R^2\) (or, equivalently, lowest log loss). We additionally performed a model comparison on the group level using the \(R^2\) measure. Consistent with the hierarchical Bayesian model selection and participant classification, the GP-UCB model achieved the highest \(R^2\) in each group (SI).
4.3 Bayesian hierarchical model selection (pxp)
For group-level model selection we computed protected exceedance probabilities (pxp), which quantify the probability that a given model is more frequent in the population than all competing models (Rigoux et al., 2014). In each group, the GP-UCB model outperformed all other models (?@fig-model-comparison-pxp).
Models’ predictive accuracy was assessed using a pseudo-\(R^2\) measure, based on the sum of negative log-likelihoods across all out-of-sample predictions. The summed log loss is compared to a random model, such that \(R^2=0\) corresponds to chance performance and \(R^2=1\) corresponds to theoretically perfect predictions.
\[
R^2 = 1 - \frac{\log \mathcal{L}(M_k)}{\log\mathcal{L}(M_{rand})},
\] In each group, the GP-UCB model had the highest predictive accuracy (Section 6).
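A minimal sketch of this pseudo-\(R^2\) computation (not the authors' code; the trial counts are for illustration), using the fact that a random model picks each of the 64 tiles with probability 1/64:

```r
# Minimal sketch (not the authors' code) of the pseudo-R^2: a model's summed
# negative log-likelihood is compared to a random baseline that picks each of
# the 64 tiles with probability 1/64. Trial counts are illustrative only.
pseudo_r2 <- function(nll_model, n_choices, n_options = 64) {
  nll_random <- n_choices * log(n_options)  # negative log-likelihood of random choice
  1 - nll_model / nll_random
}

n <- 8 * 25  # e.g., 8 scored rounds x 25 choices
r2_chance  <- pseudo_r2(nll_model = n * log(64), n_choices = n)        # chance level -> 0
r2_half    <- pseudo_r2(nll_model = 0.5 * n * log(64), n_choices = n)  # half the log loss -> 0.5
r2_perfect <- pseudo_r2(nll_model = 0, n_choices = n)                  # perfect prediction -> 1
```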
4.4 Model-based classification of participants
Code
# get subject information
groupDF <- dat_sample %>%
  select(id, age, gender, group, BDI, MMSE, hoehn_yahr,
         last_ldopa, next_ldopa, time_exp, time_since_ldopa) %>%
  group_by(id) %>%
  slice_head(n = 1) %>%
  arrange(group)

# merge to add group
modelFits <- merge(modelFits, groupDF[, c('id', 'group')], by = "id")

# write individual model fits by group
# write_csv(modelFits, "modelResults/modelFits_group.csv")
# write_csv(subset(modelFits, group == "Control"), "modelResults/modelFits_control.csv")
# write_csv(subset(modelFits, group == "PD-"), "modelResults/modelFits_PD_minus.csv")
# write_csv(subset(modelFits, group == "PD+"), "modelResults/modelFits_PD_plus.csv")

# kernels: RBF = Radial Basis Function kernel, BMT = Bayesian Mean Tracker
# acquisition functions: UCB = Upper Confidence Bound, GM = greedyMean, EG = epsilonGreedy
modelFits <- modelFits %>%
  mutate(kernel = factor(kernel, levels = c('RBF', 'BMT'), labels = c('GP', 'BMT'))) %>%
  mutate(acq = factor(acq, levels = c('UCB', 'GM', 'epsilonGreedy'),
                      labels = c('UCB', 'meanGreedy', 'epsilonGreedy')))
modelFits$ModelName <- paste(modelFits$kernel, modelFits$acq, sep = "-")

# Only include key comparisons
modelFits <- subset(modelFits, ModelName %in% c("GP-UCB", "BMT-UCB", "GP-meanGreedy", "GP-epsilonGreedy"))
modelFits$ModelName <- factor(modelFits$ModelName,
                              levels = c('GP-UCB', 'BMT-UCB', 'GP-meanGreedy', 'GP-epsilonGreedy'))

# Two-line names for models
modelFits$shortname <- factor(modelFits$ModelName,
                              labels = c('GP\nUCB', 'lambda\nlesion', 'beta\nlesion', 'tau\nlesion'))
levels(modelFits$shortname) <- c('GP\nUCB', 'lambda\nlesion', 'beta\nlesion', 'tau\nlesion')
Code
# classify participants according to model R^2
df_participant_classification <- modelFits %>%
  group_by(id) %>%
  slice_max(order_by = R2, n = 1) %>%
  select(id, group, ModelName, shortname, R2) %>%
  ungroup() %>%
  rename(best_ModelName = ModelName,
         best_shortname = shortname,
         best_R2 = R2)

df_counts <- df_participant_classification %>%
  count(group, best_shortname)

df_percent <- df_counts %>%
  group_by(group) %>%
  mutate(total_in_group = sum(n),
         percent = round((n / total_in_group) * 100, 1)) %>%
  ungroup()

# add most predictive model for each subject to df modelFits
modelFits <- modelFits %>%
  left_join(df_participant_classification, by = c("id", "group"))
We classified participants based on which model achieved the highest cross-validated predictive accuracy (highest \(R^2\); ?@fig-participant-classification). In each group, the GP-UCB model was the most predictive model for the majority of participants (Control: 55.9%, PD+: 57.6%, PD-: 58.1%).
In total, out of 98 participants, 56 (57.1%) were best described by the GP-UCB model, 22 (22.4%) by the lambda lesion model, 13 (13.3%) by the beta lesion model, and 7 (7.1%) by the tau lesion model. The results suggest that all three components of the GP-UCB model are relevant for predicting participants’ behavior.
To better understand the mechanisms underlying the observed behavioral differences, we analyzed the parameters of the Gaussian Process Upper Confidence Bound (GP-UCB) model (Figure 1).
5.0.1 Generalization \(\lambda\)
The parameter \(\lambda\) represents the length-scale in the RBF kernel, which governs the amount of generalization, i.e., to what extent participants assume a spatial correlation between options (higher \(\lambda\) = stronger generalization). Overall, the amount of generalization was similar between groups, with moderate evidence for a difference only between controls and PD- patients:
Control vs. PD+: \(U=678\), \(p=.145\), \(r_\tau=.15\), \(BF=.65\)
Control vs. PD-: \(U=731\), \(p=.007\), \(r_\tau=.28\), \(BF=3.9\)
PD+ vs. PD-: \(U=626\), \(p=.126\), \(r_\tau=.16\), \(BF=.44\)
5.0.2 Exploration bonus \(\beta\)
The parameter \(\beta\) represents the uncertainty bonus, i.e. how much expected rewards are positively inflated by their uncertainty (higher \(\beta\) = more uncertainty-directed exploration). Controls and PD+ patients on medication did not differ, and both groups had lower beta estimates than the dopamine-depleted patients in the PD− group. These differences suggest that levodopa medication modulated the amount of uncertainty-directed exploration by restoring beta to levels comparable to those observed in controls without PD. This aligns with findings from a restless bandit paradigm, where L-Dopa reduced the amount of directed exploration in healthy volunteers, while the level of random exploration remained unaffected (Chakroun et al., 2020).
Control vs. PD+: \(U=480\), \(p=.315\), \(r_\tau=-.10\), \(BF=.48\)
Control vs. PD-: \(U=188\), \(p<.001\), \(r_\tau=-.46\), \(BF=81\)
PD+ vs. PD-: \(U=220\), \(p<.001\), \(r_\tau=-.41\), \(BF=25\)
5.0.3 Random exploration \(\tau\)
The parameter \(\tau\) represents the amount of decision noise, i.e., stochastic variability in the softmax decision rule (higher \(\tau\) = more decision noise, i.e., more uniform choice probabilities; conversely, \(\tau \rightarrow 0\) approaches the deterministic argmax (greedy) choice). There were no group differences in the temperature parameter \(\tau\), indicating comparable amounts of random exploration regardless of group.
Control vs. PD+: \(U=572\), \(p=.896\), \(r_\tau=.01\), \(BF=.25\)
Control vs. PD-: \(U=500\), \(p=.730\), \(r_\tau=-.04\), \(BF=.27\)
PD+ vs. PD-: \(U=470\), \(p=.584\), \(r_\tau=-.06\), \(BF=.28\)
5.1 Model simulations
To evaluate how well different parameter settings balance exploration and exploitation, we conducted simulations with the GP-UCB model. In these simulations, we fixed the value of \(\lambda\) at 1, corresponding to the true amount of correlation in the used environments, and systematically varied the amount of random exploration (\(\tau\)) and the size of the uncertainty bonus (\(\beta\)). For each parameter we used equally log-spaced values and simulated 100 learners searching for rewards. Environments were sampled (with replacement) from the set of 40 environments used in the empirical study.
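An equally log-spaced parameter grid can be sketched as follows; the bounds reuse the fitting range \([\exp(-5), \exp(4)]\) from the cross-validation, while the grid resolution (10 values) is an assumption for illustration:

```r
# Sketch of an equally log-spaced parameter grid. The bounds reuse the
# fitting range [exp(-5), exp(4)] from the cross-validation; the resolution
# (10 values) is an assumption for illustration.
param_grid <- exp(seq(-5, 4, length.out = 10))
round(log(param_grid), 2)  # equal steps on the log scale
```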
Statistical analyses were performed using R. We report both frequentist and Bayesian statistics, using Bayes factors (BF) to quantify the relative evidence of the data in favor of the alternative hypothesis (\(H_1\)) over the null (\(H_0\)). All data and code required for reproducing the statistical analyses and figures are available at ADD GITHUB or OSF LINK.
For parametric group comparisons, we report (paired or independent) Student’s t-tests (two-tailed). For non-parametric comparisons we used the Mann-Whitney U test or Wilcoxon signed-rank test. Bayes factors for the t-tests were computed with the BayesFactor package (Morey & Rouder, 2024), using its default settings. Bayes factors for rank tests were computed following Doorn et al. (2020).
Linear correlations were assessed using Pearson’s \(r\), with the Bayes factors computed with the BayesFactor package (Morey & Rouder, 2024), using its default settings. Bayes factors for rank correlations quantified with Kendall’s tau were computed using an implementation from Doorn et al. (2018).
Figure 3: Predictive accuracy of GP-UCB model and lesioned variants.
6.2.2 Model comparison \(R^2\): Control
GP-UCB vs. lambda lesion: \(t(33)=3.0\), \(p=.005\), \(d=0.2\), \(BF=7.5\)
GP-UCB vs. beta lesion: \(t(33)=3.4\), \(p=.002\), \(d=0.2\), \(BF=19\)
GP-UCB vs. tau lesion: \(t(33)=7.7\), \(p<.001\), \(d=0.6\), \(BF>100\)
6.2.3 Model comparison \(R^2\): PD+
GP-UCB vs. lambda lesion: \(t(32)=3.4\), \(p=.002\), \(d=0.4\), \(BF=20\)
GP-UCB vs. beta lesion: \(t(32)=3.7\), \(p<.001\), \(d=0.4\), \(BF=40\)
GP-UCB vs. tau lesion: \(t(32)=8.5\), \(p<.001\), \(d=0.9\), \(BF>100\)
6.2.4 Model comparison \(R^2\): PD-
GP-UCB vs. lambda lesion: \(t(30)=3.6\), \(p=.001\), \(d=0.7\), \(BF=27\)
GP-UCB vs. beta lesion: \(t(30)=5.4\), \(p<.001\), \(d=1.1\), \(BF>100\)
GP-UCB vs. tau lesion: \(t(30)=4.9\), \(p<.001\), \(d=1.0\), \(BF>100\)
7 Relations of model parameters to performance
We assessed the correlation of GP-UCB parameter estimates with performance (mean reward) using Kendall’s tau, which is invariant under monotone transformations such as the log.
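As a quick illustration of this rationale (simulated data, not the study's), Kendall's tau computed on raw and log-transformed values is identical, because it depends only on ranks:

```r
# Quick illustration (simulated data, not the study's): Kendall's tau depends
# only on ranks, so log-transforming the positive variable leaves it unchanged.
set.seed(0815)
x <- exp(rnorm(50))                      # positive, skewed "parameter estimates"
y <- 0.5 * log(x) + rnorm(50, sd = 0.5)  # noisily related outcome

tau_raw <- cor(x, y, method = "kendall")
tau_log <- cor(log(x), y, method = "kendall")  # identical: log preserves ranks
```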
Code
# mean reward per subject across all trials and rounds (practice and bonus round excluded)
df_mean_reward_subject <- dat %>%
  # exclude first (randomly revealed) tile, practice round, and bonus round
  filter(trial != 0 & round %in% 2:9) %>%
  group_by(id) %>%
  summarise(group = first(group),
            sum_reward = sum(z),
            mean_reward = mean(z),
            sd_reward = sd(z))

df_params_performance <- df_gpucb_params %>%
  left_join(df_mean_reward_subject, by = c("id", "group"))

df_params_performance_wide <- df_gpucb_params %>%
  pivot_wider(names_from = param, values_from = estimate) %>%
  left_join(df_mean_reward_subject, by = c("id", "group"))
The amount of generalization was positively related to obtained rewards, showing that participants who successfully learned about the spatial correlation of rewards performed better. The uncertainty bonus \(\beta\) was negatively correlated with performance, demonstrating that an overreliance on exploration impairs efficient reward accumulation. The temperature parameter \(\tau\) (random exploration) was not related to obtained rewards.
Figure 4: Correlation of GP-UCB parameters with obtained mean reward across all trials and rounds. Each dot is one participant. The insets show the correlations for a restricted parameter range from 0 to 1.
7.1 Generalization \(\lambda\)
Overall, the extent of generalization was positively related to performance, suggesting that participants who generalized more strongly obtained more rewards:
Overall: \(r_\tau=.26\), \(p<.001\), \(BF>100\)
Analysis of parameter estimates on the group level showed that this overall relation was primarily driven by PD+ patients, who showed a strong relation, whereas there was no relation in controls or PD- patients:
Control: \(r_\tau=.13\), \(p=.288\), \(BF=.39\)
PD+: \(r_\tau=.45\), \(p<.001\), \(BF>100\)
PD-: \(r_\tau=-.01\), \(p=.973\), \(BF=.23\)
7.2 Exploration bonus \(\beta\)
The exploration bonus \(\beta\) driving uncertainty-directed exploration was negatively related to performance, suggesting that participants who explored too much, at the cost of exploiting known high-value options, achieved lower performance:
Overall: \(r_\tau=-.59\), \(p<.001\), \(BF>100\)
Analysis of parameter estimates on the group level showed that this negative relation was present in all three groups:
Control: \(r_\tau=-.43\), \(p<.001\), \(BF>100\)
PD+: \(r_\tau=-.61\), \(p<.001\), \(BF>100\)
PD-: \(r_\tau=-.60\), \(p<.001\), \(BF>100\)
7.3 Random exploration \(\tau\)
The temperature parameter \(\tau\) of the softmax choice rule, representing random exploration, was not related to performance:
Overall: \(r_\tau=-.07\), \(p=.308\), \(BF=.22\)
Group-level analyses likewise showed no credible relation between \(\tau\) and performance in any group:
Control: \(r_\tau=-.02\), \(p=.860\), \(BF=.23\)
PD+: \(r_\tau=.09\), \(p=.451\), \(BF=.30\)
PD-: \(r_\tau=-.23\), \(p=.077\), \(BF=1.1\)
7.4 Regression: Performance and model parameters
We also performed a regression analysis in which we included all model parameters together with group as predictors.
Code
df_params_performance_wide <- df_params_performance %>%
  filter(ModelName == "GP-UCB") %>%
  select(id, group, mean_reward, param, estimate_log10) %>%
  distinct(id, group, param, .keep_all = TRUE) %>%  # drop duplicates if any
  pivot_wider(names_from = param, values_from = estimate_log10) %>%
  drop_na(beta, tau, lambda)

lm_performance_parameters_log <- lm(mean_reward ~ group * (lambda + beta + tau),
                                    data = df_params_performance_wide)
tab_model(lm_performance_parameters_log)
Table 2: Regression results: Performance (obtained rewards) as a function of group and model parameters (log scale). Dependent variable: mean reward.

| Predictors | Estimates | CI | p |
|---|---|---|---|
| (Intercept) | 0.54 | 0.50 – 0.58 | <0.001 |
| group [PD+] | 0.10 | 0.03 – 0.17 | 0.009 |
| group [Control] | 0.08 | 0.03 – 0.14 | 0.005 |
| lambda | 0.02 | -0.20 – 0.24 | 0.858 |
| beta | -0.02 | -0.06 – 0.01 | 0.206 |
| tau | -0.01 | -0.05 – 0.03 | 0.608 |
| group [PD+] × lambda | 0.07 | -0.26 – 0.41 | 0.675 |
| group [Control] × lambda | -0.10 | -0.51 – 0.31 | 0.621 |
| group [PD+] × beta | -0.05 | -0.14 – 0.05 | 0.345 |
| group [Control] × beta | -0.07 | -0.15 – 0.01 | 0.078 |
| group [PD+] × tau | 0.03 | -0.04 – 0.09 | 0.447 |
| group [Control] × tau | 0.03 | -0.06 – 0.12 | 0.519 |
| Observations | 98 | | |
| R² / R² adjusted | 0.635 / 0.588 | | |
8 Article figure
The following code generates Figure 3 from the article.
GP-UCB model parameters \(\lambda\) (amount of generalization), \(\beta\) (exploration bonus), and \(\tau\) (amount of random exploration) across all participants.
Code
# For now, random intercepts only; random intercept + random slope not stable
# Fit models: main effects only
lm_tau_bdi_mmse    <- lm(log(tau)  ~ group + BDI + MMSE, data = df_params_clinical_indicators)
lm_beta_bdi_mmse   <- lm(log(beta) ~ group + BDI + MMSE, data = df_params_clinical_indicators)
lm_lambda_bdi_mmse <- lm(lambda    ~ group + BDI + MMSE, data = df_params_clinical_indicators)
tab_model(lm_lambda_bdi_mmse, lm_beta_bdi_mmse, lm_tau_bdi_mmse)
| Predictors | lambda | log(beta) | log(tau) |
|---|---|---|---|
| (Intercept) | -0.52 [-2.14, 1.10], p = 0.526 | 10.81 [-1.77, 23.40], p = 0.091 | 0.12 [-14.16, 14.40], p = 0.987 |
| group [PD+] | 0.01 [-0.10, 0.12], p = 0.893 | -1.00 [-1.85, -0.16], p = 0.021 | -0.67 [-1.63, 0.29], p = 0.169 |
| group [Control] | 0.10 [-0.01, 0.21], p = 0.068 | -1.41 [-2.24, -0.57], p = 0.001 | -0.62 [-1.57, 0.33], p = 0.199 |
| BDI | 0.00 [-0.01, 0.02], p = 0.624 | -0.04 [-0.13, 0.06], p = 0.452 | -0.01 [-0.12, 0.10], p = 0.817 |
| MMSE | 0.04 [-0.02, 0.09], p = 0.198 | -0.35 [-0.78, 0.07], p = 0.104 | -0.09 [-0.57, 0.40], p = 0.724 |
| Observations | 97 | 97 | 97 |
| R² / R² adjusted | 0.060 / 0.020 | 0.146 / 0.109 | 0.029 / -0.013 |
Code
# Fit models: main effects + interactions with group
lm_tau_bdi_mmse    <- lm(log(tau)  ~ group * (BDI + MMSE), data = df_params_clinical_indicators)
lm_beta_bdi_mmse   <- lm(log(beta) ~ group * (BDI + MMSE), data = df_params_clinical_indicators)
lm_lambda_bdi_mmse <- lm(lambda    ~ group * (BDI + MMSE), data = df_params_clinical_indicators)
tab_model(lm_lambda_bdi_mmse, lm_beta_bdi_mmse, lm_tau_bdi_mmse)
| Predictors | lambda | log(beta) | log(tau) |
|---|---|---|---|
| (Intercept) | 0.24 [-2.85, 3.32], p = 0.880 | 3.34 [-19.40, 26.08], p = 0.771 | -7.03 [-33.87, 19.82], p = 0.604 |
| group [PD+] | -1.38 [-5.74, 2.99], p = 0.532 | 2.14 [-30.00, 34.27], p = 0.895 | 2.68 [-35.26, 40.62], p = 0.889 |
| group [Control] | -0.82 [-4.87, 3.23], p = 0.689 | 20.50 [-9.37, 50.37], p = 0.176 | 17.19 [-18.08, 52.45], p = 0.335 |
| BDI | 0.00 [-0.02, 0.02], p = 0.925 | -0.12 [-0.29, 0.04], p = 0.135 | 0.00 [-0.19, 0.20], p = 0.973 |
| MMSE | 0.01 [-0.09, 0.11], p = 0.844 | -0.07 [-0.84, 0.70], p = 0.858 | 0.16 [-0.75, 1.07], p = 0.732 |
| group [PD+] × BDI | 0.01 [-0.03, 0.04], p = 0.726 | 0.07 [-0.17, 0.30], p = 0.560 | -0.07 [-0.35, 0.20], p = 0.598 |
| group [Control] × BDI | -0.00 [-0.03, 0.03], p = 0.984 | 0.26 [0.02, 0.49], p = 0.031 | 0.07 [-0.21, 0.34], p = 0.641 |
| group [PD+] × MMSE | 0.05 [-0.10, 0.19], p = 0.534 | -0.13 [-1.21, 0.95], p = 0.811 | -0.10 [-1.37, 1.18], p = 0.881 |
| group [Control] × MMSE | 0.03 [-0.11, 0.17], p = 0.646 | -0.83 [-1.85, 0.18], p = 0.107 | -0.64 [-1.84, 0.57], p = 0.296 |
| Observations | 97 | 97 | 97 |
| R² / R² adjusted | 0.066 / -0.019 | 0.234 / 0.165 | 0.057 / -0.029 |
Linear regression with GP-UCB model parameters as a function of group, depression level (BDI-II), and cognitive functioning (MMSE). All participants.
GP-UCB model parameters \(\lambda\) (amount of generalization), \(\beta\) (exploration bonus), and \(\tau\) (amount of random exploration); PD patients only.
Code
# For now, random intercepts only; random intercept + random slope not stable
# Fit models: main effects only (PD patients only)
lm_tau_bdi_mmse_hy    <- lm(log(tau)  ~ group + BDI + MMSE + hoehn_yahr,
                            data = subset(df_params_clinical_indicators, group != "Control"))
lm_beta_bdi_mmse_hy   <- lm(log(beta) ~ group + BDI + MMSE + hoehn_yahr,
                            data = subset(df_params_clinical_indicators, group != "Control"))
lm_lambda_bdi_mmse_hy <- lm(lambda    ~ group + BDI + MMSE + hoehn_yahr,
                            data = subset(df_params_clinical_indicators, group != "Control"))
tab_model(lm_lambda_bdi_mmse_hy, lm_beta_bdi_mmse_hy, lm_tau_bdi_mmse_hy)
| Predictors | lambda | log(beta) | log(tau) |
|---|---|---|---|
| (Intercept) | -0.74 [-2.88, 1.39], p = 0.487 | 7.04 [-11.08, 25.15], p = 0.440 | -5.30 [-26.64, 16.04], p = 0.621 |
| group [PD+] | 0.01 [-0.10, 0.12], p = 0.853 | -1.08 [-1.99, -0.17], p = 0.021 | -0.73 [-1.80, 0.35], p = 0.180 |
| BDI | 0.00 [-0.01, 0.02], p = 0.688 | -0.08 [-0.21, 0.04], p = 0.195 | -0.03 [-0.18, 0.12], p = 0.676 |
| MMSE | 0.04 [-0.03, 0.11], p = 0.256 | -0.19 [-0.79, 0.42], p = 0.542 | 0.11 [-0.61, 0.82], p = 0.765 |
| hoehn_yahr | 0.05 [-0.04, 0.13], p = 0.282 | -0.35 [-1.07, 0.36], p = 0.329 | 0.01 [-0.83, 0.85], p = 0.980 |
| Observations | 64 | 64 | 64 |
| R² / R² adjusted | 0.037 / -0.028 | 0.137 / 0.079 | 0.036 / -0.029 |
Code
# Fit models: main effects and interactions with group (PD patients only)
lm_tau_bdi_mmse_hy    <- lm(log(tau)  ~ group * (BDI + MMSE + hoehn_yahr),
                            data = subset(df_params_clinical_indicators, group != "Control"))
lm_beta_bdi_mmse_hy   <- lm(log(beta) ~ group * (BDI + MMSE + hoehn_yahr),
                            data = subset(df_params_clinical_indicators, group != "Control"))
lm_lambda_bdi_mmse_hy <- lm(lambda    ~ group * (BDI + MMSE + hoehn_yahr),
                            data = subset(df_params_clinical_indicators, group != "Control"))
tab_model(lm_lambda_bdi_mmse_hy, lm_beta_bdi_mmse_hy, lm_tau_bdi_mmse_hy)
| Predictors | lambda | log(beta) | log(tau) |
|---|---|---|---|
| (Intercept) | -0.69 [-3.62, 2.24], p = 0.640 | 6.50 [-19.80, 32.80], p = 0.622 | -10.59 [-41.20, 20.03], p = 0.491 |
| group [PD+] | -0.11 [-4.25, 4.04], p = 0.959 | 0.07 [-37.12, 37.25], p = 0.997 | 9.70 [-33.59, 52.99], p = 0.655 |
| BDI | -0.00 [-0.02, 0.02], p = 0.782 | -0.11 [-0.29, 0.07], p = 0.237 | -0.01 [-0.23, 0.20], p = 0.913 |
| MMSE | 0.03 [-0.06, 0.13], p = 0.501 | -0.15 [-1.02, 0.73], p = 0.738 | 0.24 [-0.78, 1.27], p = 0.633 |
| hoehn_yahr | 0.15 [0.04, 0.27], p = 0.011 | -0.53 [-1.58, 0.52], p = 0.320 | 0.59 [-0.63, 1.82], p = 0.335 |
| group [PD+] × BDI | 0.01 [-0.02, 0.04], p = 0.494 | 0.06 [-0.21, 0.32], p = 0.664 | -0.05 [-0.36, 0.26], p = 0.734 |
| group [PD+] × MMSE | 0.01 [-0.12, 0.15], p = 0.829 | -0.08 [-1.32, 1.15], p = 0.897 | -0.27 [-1.71, 1.17], p = 0.709 |
| group [PD+] × hoehn_yahr | -0.21 [-0.37, -0.05], p = 0.013 | 0.36 [-1.11, 1.82], p = 0.628 | -1.14 [-2.84, 0.56], p = 0.185 |
| Observations | 64 | 64 | 64 |
| R² / R² adjusted | 0.145 / 0.039 | 0.148 / 0.041 | 0.070 / -0.046 |
Linear regression with GP-UCB model parameters as a function of group, depression level (BDI-II), cognitive functioning (MMSE), and disease stage (Hoehn & Yahr). PD patients only.
10 Model params of participants best explained by GP-UCB model
For this analysis we only consider participants who were best explained by the GP-UCB model. The results are consistent with the same analyses performed with the full sample above: no substantial differences in amount of generalization \(\lambda\), marked differences in terms of the exploration bonus \(\beta\), and no differences in terms of random exploration \(\tau\). The only difference is that we found a difference between the control and off-medication group in the extent of generalization when using the full sample, whereas we found no difference when only considering the subset of participants best accounted for by the GP-UCB model.
10.0.1 Generalization \(\lambda\)
The parameter \(\lambda\) represents the length-scale in the RBF kernel, which governs the amount of generalization, i.e., to what extent participants assume a spatial correlation between options (higher \(\lambda\) = stronger generalization). Overall, the amount of generalization was very similar between groups.
Control vs. PD+: \(U=210\), \(p=.402\), \(r_{\tau}=.12\), \(BF=.44\)
Control vs. PD-: \(U=219\), \(p=.150\), \(r_{\tau}=.20\), \(BF=.63\)
PD+ vs. PD-: \(U=191\), \(p=.558\), \(r_{\tau}=.08\), \(BF=.34\)
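The role of \(\lambda\) can be illustrated with a minimal sketch, assuming the standard squared-exponential (RBF) parameterization \(k(x, x') = \exp\!\left(-(x - x')^2 / (2\lambda^2)\right)\) used in this modeling literature (the function name is illustrative, not the analysis code):

```r
# RBF (squared-exponential) kernel: correlation between two options decays
# with their squared distance, scaled by the length-scale lambda
rbf_kernel <- function(x1, x2, lambda) {
  exp(-(x1 - x2)^2 / (2 * lambda^2))
}

# With a short length-scale, distant options are treated as unrelated;
# with a long length-scale, rewards generalize broadly across the grid
rbf_kernel(1, 5, lambda = 0.5)  # ~0: almost no generalization
rbf_kernel(1, 5, lambda = 4)    # ~0.61: strong generalization
```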
10.0.2 Exploration bonus \(\beta\)
The parameter \(\beta\) represents the uncertainty bonus, i.e. how much expected rewards are positively inflated by their uncertainty (higher \(\beta\) = more uncertainty-directed exploration). Controls and PD+ patients on medication did not differ, and both groups had lower beta estimates than the dopamine-depleted patients in the PD− group. These differences suggest that levodopa medication modulated the amount of uncertainty-directed exploration by restoring beta to levels comparable to those observed in controls without PD. This aligns with findings from a restless bandit paradigm, where L-Dopa reduced the amount of directed exploration in healthy volunteers, while the level of random exploration remained unaffected (Chakroun et al., 2020).
Control vs. PD+: \(U=182\), \(p=.977\), \(r_{\tau}=.01\), \(BF=.32\)
Control vs. PD-: \(U=55\), \(p<.001\), \(r_{\tau}=-.49\), \(BF=14\)
PD+ vs. PD-: \(U=54\), \(p<.001\), \(r_{\tau}=-.49\), \(BF=31\)
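How \(\beta\) enters the choice process can be sketched under the standard UCB formulation \(\text{UCB}(x) = m(x) + \beta \cdot s(x)\) (variable names are illustrative, not the analysis code):

```r
# UCB value of an option: posterior mean reward inflated by beta times the
# posterior uncertainty (standard deviation)
ucb <- function(m, s, beta) m + beta * s

m <- c(0.6, 0.5)  # expected rewards of two options
s <- c(0.1, 0.4)  # posterior uncertainty (option 2 is less explored)

which.max(ucb(m, s, beta = 0))    # 1: greedy valuation ignores uncertainty
which.max(ucb(m, s, beta = 0.5))  # 2: uncertainty bonus favors option 2
```

A larger \(\beta\) thus shifts preference toward uncertain options, which is the sense in which it indexes uncertainty-directed exploration.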
10.0.3 Random exploration \(\tau\)
The parameter \(\tau\) represents the amount of decision noise, i.e. stochastic variability in the softmax decision rule (higher \(\tau\) = more decision noise, i.e. a more uniform choice distribution; conversely, \(\tau \rightarrow 0\) yields the greedy argmax choice). There were no group differences in the temperature parameter \(\tau\), indicating comparable amounts of random exploration regardless of group.
Control vs. PD+: \(U=193\), \(p=.729\), \(r_{\tau}=.05\), \(BF=.35\)
Control vs. PD-: \(U=162\), \(p=.799\), \(r_{\tau}=-.04\), \(BF=.33\)
PD+ vs. PD-: \(U=140\), \(p=.358\), \(r_{\tau}=-.13\), \(BF=.44\)
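The temperature's effect can be sketched with the standard softmax parameterization \(P(x) \propto \exp(q(x)/\tau)\), where \(q(x)\) are the (UCB) option values (a minimal illustration, not the analysis code):

```r
# Softmax choice rule: probability of choosing each option given its
# value q and temperature tau
softmax <- function(q, tau) {
  p <- exp(q / tau)
  p / sum(p)
}

q <- c(0.7, 0.5, 0.2)
round(softmax(q, tau = 0.05), 3)  # near-greedy: almost all mass on option 1
round(softmax(q, tau = 5), 3)     # high temperature: close to uniform
```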
Figure 5: Parameter estimates of the GP-UCB model, estimated through leave-one-round-out cross-validation. Each dot is one participant. Only participants who were best described by the GP-UCB model are included.
Auer, P. (2002). Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov), 397–422.
Chakroun, K., Mathar, D., Wiehler, A., Ganzer, F., & Peters, J. (2020). Dopaminergic modulation of the exploration/exploitation trade-off in human decision-making. eLife, 9, e51260. https://doi.org/10.7554/eLife.51260
Dayan, P., Kakade, S., & Montague, P. R. (2000). Learning and selective attention. Nature Neuroscience, 3(11), 1218–1223.
Djamshidian, A., O’Sullivan, S. S., Wittmann, B. C., Lees, A. J., & Averbeck, B. B. (2011). Novelty seeking behaviour in Parkinson’s disease. Neuropsychologia, 49(9), 2483–2488. https://doi.org/10.1016/j.neuropsychologia.2011.04.026
Doorn, J. van, Ly, A., Marsman, M., & Wagenmakers, E.-J. (2018). Bayesian inference for Kendall’s rank correlation coefficient. The American Statistician, 72, 303–308.
Doorn, J. van, Ly, A., Marsman, M., & Wagenmakers, E.-J. (2020). Bayesian rank-based hypothesis testing for the rank sum test, the signed rank test, and Spearman’s \(\rho\). Journal of Applied Statistics, 47(16), 2984–3006.
Gershman, S. J. (2015). A unifying probabilistic view of associative learning. PLoS Comput Biol, 11(11), e1004567.
Gilmour, W., Mackenzie, G., Feile, M., Tayler-Grint, L., Suveges, S., Macfarlane, J. A., Macleod, A. D., Marshall, V., Grunwald, I. Q., Steele, J. D., et al. (2024). Impaired value-based decision-making in Parkinson’s disease apathy. Brain, 147(4), 1362–1376.
Giron, A. P., Ciranka, S., Schulz, E., Bos, W. van den, Ruggeri, A., Meder, B., & Wu, C. M. (2023). Developmental changes in exploration resemble stochastic optimization. Nature Human Behaviour, 7(11), 1955–1967.
Meder, B., Wu, C. M., Schulz, E., & Ruggeri, A. (2021). Development of directed and random exploration in children. Developmental Science, 24(4), e13095. https://doi.org/10.1111/desc.13095
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. Classical Conditioning II: Current Research and Theory, 2, 64–99.
Rigoux, L., Stephan, K. E., Friston, K. J., & Daunizeau, J. (2014). Bayesian model selection for group studies—revisited. Neuroimage, 84, 971–985.
Schulz, E., Speekenbrink, M., & Krause, A. (2018). A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions. Journal of Mathematical Psychology, 85, 1–16.
Schulz, E., Wu, C. M., Ruggeri, A., & Meder, B. (2019). Searching for rewards like a child means less generalization and more directed exploration. Psychological Science, 30(11), 1561–1572. https://doi.org/10.1177/0956797619863663
Seymour, B., Barbe, M., Dayan, P., Shiner, T., Dolan, R., & Fink, G. R. (2016). Deep brain stimulation of the subthalamic nucleus modulates sensitivity to decision outcome value in Parkinson’s disease. Scientific Reports, 6(1), 32509.
Shohamy, D., Myers, C. E., Geghman, K. D., Sage, J., & Gluck, M. A. (2006). L-dopa impairs learning, but spares generalization, in Parkinson’s disease. Neuropsychologia, 44(5), 774–784.
Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3), 279–292.
Williams, C. K., & Rasmussen, C. E. (2006). Gaussian processes for machine learning. MIT Press Cambridge, MA.
Witt, A., Toyokawa, W., Lala, K. N., Gaissmaier, W., & Wu, C. M. (2024). Humans flexibly integrate social information despite interindividual differences in reward. Proceedings of the National Academy of Sciences, 121(39), e2404928121. https://doi.org/10.1073/pnas.2404928121
Wu, C. M., Meder, B., & Schulz, E. (2025). Unifying principles of generalization: Past, present, and future. Annual Review of Psychology, 76, 275–302. https://doi.org/10.1146/annurev-psych-021524-110810
Wu, C. M., Schulz, E., Garvert, M. M., Meder, B., & Schuck, N. W. (2020). Similarities and differences in spatial and non-spatial cognitive maps. PLOS Computational Biology, 16(9), e1008149. https://doi.org/10.1371/journal.pcbi.1008149
Wu, C. M., Schulz, E., Pleskac, T. J., & Speekenbrink, M. (2022). Time pressure changes how people explore and respond to uncertainty. Scientific Reports, 12, 1–14. https://doi.org/10.1038/s41598-022-07901-1
Wu, C. M., Schulz, E., Speekenbrink, M., Nelson, J. D., & Meder, B. (2018). Generalization guides human exploration in vast decision spaces. Nature Human Behaviour, 2, 915–924. https://doi.org/10.1038/s41562-018-0467-4